FG: A Framework Generator for Hiding Latency in Parallel Programs Running on Clusters

نویسندگان

  • Thomas H. Cormen
  • Elena Riccio Davidson
چکیده

FG is a programming environment for asynchronous programs that run on clusters and fit into a pipeline framework. It enables the programmer to write a series of synchronous functions and represents them as stages of an asynchronous pipeline. FG mitigates the high latency inherent in interprocessor communication and accessing the outer levels of the memory hierarchy. It overlaps separate pipeline stages that perform communication, computation, and I/O by running the stages asynchronously. Each stage maps to a thread. Buffers, whose sizes correspond to block sizes in the memory hierarchy, traverse the pipeline. FG makes such pipeline-structured parallel programs easier to write, smaller, and faster. FG offers several advantages over statically scheduled overlapping and dynamically scheduled overlapping via explicit calls to thread functions. First, it reduces coding and debugging time. Second, we find that it reduces code size by approximately 15–26%. Third, according to experimental results, it improves performance. Compared with programs that use static scheduling, FG-generated programs run approximately 61–69% faster on a 16-node Beowulf cluster. Compared with programs that make explicit calls for dynamically scheduled threads, FG-generated programs run slightly faster. Fourth, FG offers various design options and makes it easy for the programmer to explore different pipeline configurations. ∗Contact author. Supported in part by DARPA Award W0133940 in collaboration with IBM and in part by National Science Foundation Grant IIS-0326155 in collaboration with the University of Connecticut. †Supported in part by DARPA Award W0133940 in collaboration with IBM. To appear in the 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004). Copyright c © 2004, The International Society for Computers and Their Applications, Inc.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The FG Programming Environment: Reducing Source Code Size for Parallel Programs Running on Clusters

FG is a programming environment designed to reduce the source code size and complexity of out-of-core programs running on clusters. Our goals for FG are threefold: (1) make these programs smaller, (2) make them faster, and (3) reduce time-to-solution. In this paper, we focus on the first metric: the efficacy of FG for reducing source code size and complexity. We designed FG to fit programs, inc...

متن کامل

Tackling Latency Using Fg

Applications that operate on datasets which are too big to fit in main memory, known in the literature as external-memory or out-of-core applications, store their data on one or more disks. Several of these applications make multiple passes over the data, where each pass reads data from disk, operates on it, and writes data back to disk. Compared with an in-memory operation, a disk-I/O operatio...

متن کامل

Network Interface Active Messages for Low Overhead Communication on SMP PC Clusters

NICAM is a communication layer for SMP PC clusters connected via Myrinet, designed to reduce overhead and latency by directly utilizing a micro-processor equipped on the network interface. It adopts remote memory operations to reduce much of the overhead found in message passing. NICAM employs an Active Messages framework for exibility in programming on the network interface, and this exibility...

متن کامل

CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-core Clusters

Rapid advancements in multi-core processor architectures along with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs to efficiently utilize all the resources in such a cluster is still a major challenge. Programmers have to manually deal with low-level details that should idea...

متن کامل

Evaluating SPLASH-2 Applications Using MapReduce

MapReduce has been prevalent for running data-parallel applications. By hiding other non-functionality parts such as parallelism, fault tolerance and load balance from programmers, MapReduce significantly simplifies the programming of large clusters. Due to the mentioned features of MapReduce above, researchers have also explored the use of MapReduce on other application domains, such as machin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004